Method of joint decoding of possibly mutilated code words

ABSTRACT

The invention relates to a method of decoding possibly mutilated code words (r) of a code (C), wherein an information word (m) and an address word (a) are encoded into a code word (c) of said code (C) using a generator matrix (G) and wherein said address words (a) are selected such that address words (a) having a known relationship are assigned to consecutive code words (c). To provide a reliable way of decoding making use of the known relationship, a method comprising the following steps is proposed: decoding the differences (D) of a number (L−1) of pairs of possibly mutilated code words (r_(i), r_(i+1)) to obtain estimates (u, v) for the differences of the corresponding pairs of code words (c_(i), c_(i+1)), combining said estimates (u, v) to obtain a number (L) of at least two corrupted versions (w_(j)) of a particular code word (c), forming a code vector (z) from said number (L) of corrupted versions (w_(j)) of said particular code word (c) in each coordinate, decoding said code vector (z) to a decoded code word (c′) in said code (C), and using said generator matrix (G) to obtain the information word (m) and the address word (a) embedded in said decoded code word (c′).

The invention relates to a method of decoding possibly mutilated code words of a code, wherein an information word and an address word are encoded into a code word of said code using a generator matrix and wherein said address words are selected such that address words having a known relationship are assigned to consecutive code words. The invention relates further to a corresponding apparatus for decoding possibly mutilated code words and to a computer program for implementing said method.

In European patent application 01201841.2 (PHNL 10331), the concept of coding for informed decoding is described. An enhancement can be found in European patent application 01 203 147.2 (PH-NL 010600). The key element of the inventions described in said European patent applications is an appropriate selection of the mapping of information strings towards code words from the Error Correcting Code (ECC) that is applied. The aim of coding for informed decoders is to enable more reliable information retrieval if the decoder a priori knows part of the encoded information. A typical example is in the field of address retrieval of optical media. In case of a forced jump to a certain sector, part of the address of the sector in which the read/write head will land is known taking the jump accuracy into account. For instance in DVR (Digital Video Recording) it is anticipated that on the spiral track on the optical disc, a series of messages is encoded which aids in the logical positioning of the read/write head above the disc, which e.g. contains copyright information, date information and a logical track counter. Of this information, only the logical track counter changes in between successive messages. Therefore, a priori known information can be available from successful decoding of previous messages.

According to the solutions described in the above mentioned European patent applications, decoding of a second message can only gain from the decoding of the first message if said first decoding is successful. A second decoding can then in turn aid in the decoding of the third message and so on. However, if the decoding of the first message fails, the fact that the second message has been encoded in a special way does not improve the error correcting capabilities, i.e. the second and any further decoding is not helped by previous decodings. All decodings could then fail. In other situations, for example at the very start of a recording or playback session, not much is known about the actual landing place of the read/write head. In that case, only the error correcting capabilities of the code can be used for retrieving information.

It is therefore an object of the present invention to provide an improved method and apparatus of decoding possibly mutilated code words which can be used if the error correcting capabilities of the code are insufficient to allow reliable information retrieval and which can also be used if there are so many errors that, despite application of informed decoding as described in the above mentioned European patent applications, these errors cannot be corrected.

This object is achieved according to the present invention by a method of decoding as claimed in claim 1, comprising the steps of:

-   decoding the differences of a number of pairs of possibly mutilated code words to obtain estimates for the differences of the corresponding pairs of code words,
-   combining said estimates to obtain a number of at least two corrupted versions of a particular code word,
-   forming a code vector from said number of corrupted versions of said particular code word in each coordinate,
-   decoding said code vector to a decoded code word in said code, and
-   using said generator matrix to obtain the information word and the address word embedded in said decoded code word.

A corresponding apparatus for decoding according to the present invention is claimed in claim 11. A computer program comprising computer program code means for causing a computer to perform the steps of the method as claimed in claim 1 when that computer program is run on said computer is claimed in claim 12. Preferred embodiments of the invention are defined in the dependent claims.

The present invention is based on the idea to employ certain relationships between consecutive code words and to jointly decode several such consecutive code words. “Consecutive” code words in this context shall mean code words which are read and/or inputted into the decoder subsequently, e.g. code words which are located in series next to each other in a data stream or which are stored in consecutive sectors on an information carrier such as a CD, DVD or DVR or magnetic disc.

The proposed method of joint decoding comprises two main elements. A first main element comprises the step of obtaining estimates for the differences of various pairs of consecutive code words by decoding the differences of their corrupted versions. According to a second main element the decoding results of said differences of corrupted pairs of consecutive code words are combined resulting in a number of corrupted versions of one and the same code word. These corrupted versions then are all used for obtaining the desired code word from which finally the information word and the address word can be retrieved. Since the address words encoded in the code words have a certain known relationship, much can be said during decoding about the difference of possibly mutilated code words which knowledge is advantageously used according to the invention during decoding.

Preferred embodiments of the step of forming the code vector are defined in claims 2 to 4. Preferably, said code vector is formed by majority voting in each coordinate from the number of corrupted versions obtained from the estimates. If more than one value occurs most frequently among said number of corrupted versions, the corresponding coordinate of said code vector is erased according to a further preferred embodiment. Alternatively or in addition, reliability information available on the symbols of one or more possibly mutilated code words is used for selecting the coordinates of said code vector according to still another preferred embodiment. Reliability information could be information about the probability that a certain value is correct. If available, it is preferred to include reliability information for each of the bits of the code vector for enhancing its decoding.

A preferred way of obtaining an estimate for the difference of a pair of code words is to decode the difference of the corresponding pair of possibly mutilated code words to the closest code word from a subcode consisting of all possible differences of two consecutive code words of the main code which closest code word is then used as said estimate.

The probability of incorrect decoding can be further reduced by introducing an extra check according to which the obtained estimates are checked if they show a predetermined form and/or have a possible value. Preferably, if this check fails, the decoding result should be rejected.

The present invention is advantageously used for decoding of code words stored on an information carrier, such as an optical or magnetic disc. According to standards used for optical recording, such as the CD-DA or the DVR standard, address words assigned to consecutive code words are consecutive, e.g. are subsequently increased by one, and preferably represent the sector address of the sector in which the corresponding code word is stored. Said relationship between the address words assigned to consecutive code words will be exploited by the present invention.

For reducing decoding delay, it is advantageous if the number of pairs of possibly mutilated code words to be decoded is as small as possible. However, at least two pairs of two consecutive possibly mutilated code words have to be decoded so that at least three corrupted versions of the particular code word can be obtained, since majority voting on two alternatives is not useful. If reliability information is used instead of majority voting, one pair might be sufficient.

According to another aspect of the present invention it is proposed that in said step of combining said estimates to obtain a number of corrupted versions of a particular code word a first corrupted version corresponds to a first possibly mutilated code word, a second corrupted version corresponds to the difference between a second possibly mutilated code word and a first estimate, obtained by decoding the difference between said first and said second possibly mutilated code words, and a third corrupted version corresponds to the difference between a third possibly mutilated code word, said first estimate and a second estimate, obtained by decoding the difference between said second and said third possibly mutilated code words.

According to still another aspect of the invention the proposed solution could also be used in combination with the solutions described in the above mentioned European patent applications using a priori known information during decoding. A preferred way could be that, in a first step, the decoder tries to decode a possibly mutilated code word using a priori known information available on the address word embedded in said possibly mutilated code word. If this decoding fails or if the result is not reliable enough, then the present solution of joint decoding could be used. However, it is also possible that the present solution is always used in addition to the use of a priori known information during decoding.

The invention will now be explained more in detail with reference to the drawings, in which

FIG. 1 shows the format of a data word information to be encoded,

FIG. 2 shows the format of a code word,

FIG. 3 shows a block diagram of an encoding and decoding scheme, and

FIG. 4 shows a block diagram of the method of decoding according to the present invention.

In the following, it is assumed that information is represented as a k-bits string called data word d as shown in FIG. 1, said data word d comprising an address word a(i) and an information word m. The address word a(i) is a b-bits string representing the sector address for sector i; the information word m is a (k−b)-bits string containing any information to be stored, such as audio, video, software, copyright or date information or any other kind of data. It should be noted that the address word a(i) is known if i is known and vice versa; however, knowledge of i does not give information on the information word m.

The data word d shown in FIG. 1 is encoded using a k x n binary generator matrix G so that the data word d=(a(i), m) is mapped on the n-bits code word c(i, m)=(a(i), m)G as shown in FIG. 2.
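The mapping of FIG. 2 can be illustrated with a minimal Python sketch. The 4 x 7 matrix G, the address length b=2 and the helper names address_bits and encode are assumptions chosen for illustration only; the description does not prescribe any particular generator matrix.

```python
# Sketch: encode d = (a(i), m) with a k x n binary generator matrix G over GF(2).
# G is a hypothetical toy matrix (k = 4, n = 7); it is not taken from the description.

def address_bits(i, b):
    """Conventional b-bit binary representation of i, most significant bit first."""
    return [(i >> (b - 1 - t)) & 1 for t in range(b)]

def encode(data_word, G):
    """Return the code word d*G with all arithmetic modulo 2."""
    n = len(G[0])
    return [sum(d * row[col] for d, row in zip(data_word, G)) % 2 for col in range(n)]

G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

b = 2                      # assumed address length
a = address_bits(2, b)     # address word a(2)
m = [1, 1]                 # information word of k - b = 2 bits
c = encode(a + m, G)       # code word c(2, m) = (a(2), m)G
```

Because the toy matrix is systematic, the first k bits of c repeat the data word; nothing in the description requires that property.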

FIG. 3 shows a block diagram of a typical system using encoding and decoding. Therein user data, e.g. audio or video data, coming from a data source 1, e.g. recorded on a master tape or master disc, are encoded before they are stored on a data carrier, e.g. a disc, or transmitted over a transmission channel, e.g. over the internet, before they are again decoded for forwarding them to a data sink 9, e.g. for replaying them.

The user data of the source 1 are first encoded by a source encoder 2, then error correction encoded by an ECC encoder 3 and thereafter modulated by a modulator 4, e.g. an EFM modulator, before the encoded user data (the code words) are put on the channel 5, on which errors may be introduced into the code words. The term “channel” 5 shall here be interpreted broadly, including a transmission channel as well as storage of the encoded data on a data carrier for a later replay.

When replay of data is intended, the encoded data first have to be demodulated by a demodulator 6, e.g. an EFM demodulator, before they are error correction decoded by an ECC decoder 7 and source decoded by a source decoder 8. Finally, the decoded user data can be input to the sink 9, e.g. a player device for replay of the user data.

The method of decoding according to the present invention shall be explained in more detail with reference to FIG. 4. It shall be assumed that data stored in encoded form on the record carrier 10 shall be replayed. In a first step, an amount of data r is read by a reading unit 11 and forwarded to a decoding apparatus 12. On their way from the encoder to the decoder, errors might be introduced into the code words, e.g. by scratches on an optical record carrier or by transmission errors, so that the read code words r are possibly mutilated. Those errors shall be corrected by the decoder 12.

At first, the difference D of the possibly mutilated code words r_(i) and r_(i+1) situated in consecutive sectors i and i+1 (or located consecutively in a transmitted data stream) is computed in unit 13. For the underlying code words it holds that

c(i, m₁)⊕c(i+1, m₂)=(Δ(i), m₁⊕m₂)G,

wherein Δ(i)=a(i)⊕a(i+1) for 0≦i≦2^(b)−2. The key observation is that for appropriate choices of the address word a, much can be said of the difference Δ(i) of two consecutive address words. It should be noted that ⊕ indicates a modulo-2 operation in which additions and subtractions have the same result.
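The identity for the difference of two consecutive code words follows from the linearity of the code and can be checked exhaustively for a small example. The sketch below assumes a hypothetical 4 x 7 generator matrix G and b=2 address bits; both are illustrative choices only.

```python
from itertools import product

# Hypothetical toy generator matrix (k = 4, n = 7), for illustration only.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
K, B = 4, 2  # assumed data word length and address length

def address_bits(i, b):
    return [(i >> (b - 1 - t)) & 1 for t in range(b)]

def encode(d, G):
    return [sum(x * row[col] for x, row in zip(d, G)) % 2 for col in range(len(G[0]))]

def xor(x, y):
    return [p ^ q for p, q in zip(x, y)]

# Count every (i, m1, m2) for which c(i,m1) + c(i+1,m2) != (Delta(i), m1+m2)G.
mismatches = 0
for i in range(2**B - 1):
    delta = xor(address_bits(i, B), address_bits(i + 1, B))
    for m1 in product([0, 1], repeat=K - B):
        for m2 in product([0, 1], repeat=K - B):
            lhs = xor(encode(address_bits(i, B) + list(m1), G),
                      encode(address_bits(i + 1, B) + list(m2), G))
            rhs = encode(delta + xor(m1, m2), G)
            mismatches += lhs != rhs
```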

Assuming that corrupted versions r₁, r₂, r₃ of L=3 consecutive code words c₁, c₂, c₃ are read, one can write for j=1, 2, 3:

r_(j)=c(i+j−1, m_(j))⊕e_(j)=(a(i+j−1), m_(j))G⊕e_(j),

e_(j) representing an error vector. It is clear that

D₁₂=r₁⊕r₂=(a(i), m₁)G⊕(a(i+1), m₂)G⊕(e₁⊕e₂)=(Δ(i), m₁⊕m₂)G⊕(e₁⊕e₂)

and

D₂₃=r₂⊕r₃=(Δ(i+1), m₂⊕m₃)G⊕(e₂⊕e₃).

These L−1=2 differences D₁₂ and D₂₃ are computed by unit 13 and inputted into the first decoding unit 14.

In the decoding unit 14 said differences D₁₂, D₂₃ are decoded each to the closest code word from a subcode C′ which consists of all possible differences of two consecutive code words c of the main code C. This could e.g. be done by comparing the differences D₁₂, D₂₃ with all possible code words of the subcode C′ and by selecting the closest code word as estimate u for c(i, m₁)⊕c(i+1, m₂) and as estimate v for c(i+1, m₂)⊕c(i+2, m₃). Thus, in the first decoding unit 14 estimates u, v for the differences of the pairs of code words c₁⊕c₂ and c₂⊕c₃ corresponding to the pairs of possibly mutilated code words r₁⊕r₂ and r₂⊕r₃ are obtained.
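The exhaustive comparison with all words of the subcode C′ can be sketched as follows. The generator matrix G, the parameters K and B and all function names are hypothetical; ties between equally close candidates are resolved arbitrarily here, which is a simplification and not part of the description.

```python
from itertools import product

# Hypothetical toy generator matrix (k = 4, n = 7), for illustration only.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
K, B = 4, 2  # assumed data word length and address length

def address_bits(i, b):
    return [(i >> (b - 1 - t)) & 1 for t in range(b)]

def encode(d, G):
    return [sum(x * row[col] for x, row in zip(d, G)) % 2 for col in range(len(G[0]))]

def xor(x, y):
    return [p ^ q for p, q in zip(x, y)]

def hamming_distance(x, y):
    return sum(p != q for p, q in zip(x, y))

def difference_subcode(G, k, b):
    """All words (Delta(i), m)G: possible differences of two consecutive code words."""
    deltas = {tuple(xor(address_bits(i, b), address_bits(i + 1, b)))
              for i in range(2**b - 1)}
    return [encode(list(d) + list(m), G)
            for d in sorted(deltas) for m in product([0, 1], repeat=k - b)]

def decode_to_nearest(word, candidates):
    """Return the candidate closest in Hamming distance (first one wins on a tie)."""
    return min(candidates, key=lambda cand: hamming_distance(word, cand))

subcode = difference_subcode(G, K, B)
c1 = encode(address_bits(1, B) + [1, 0], G)
c2 = encode(address_bits(2, B) + [0, 1], G)
r1 = list(c1)
r1[5] ^= 1                       # one bit error in the first read word
r2 = list(c2)                    # second word read without error
u = decode_to_nearest(xor(r1, r2), subcode)   # estimate for c1 + c2
```

The number of comparisons grows with the size of C′, so this brute-force search is only practical when the subcode is small.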

These estimates u, v are combined in unit 15 by computing

w₁:=r₁=c(i, m₁)⊕e₁
w₂:=r₂⊕u=r₁⊕(r₁⊕r₂⊕u)
w₃:=r₃⊕u⊕v=r₁⊕(r₁⊕r₂⊕u)⊕(r₂⊕r₃⊕v).

If the estimate u is correct, then r₁⊕r₂⊕u=e₁⊕e₂. Similarly, if v is correct, then r₂⊕r₃⊕v=e₂⊕e₃. Hence, if the estimates u and v both are correct, then

w₁=c(i, m₁)⊕e₁
w₂=c(i, m₁)⊕e₂
w₃=c(i, m₁)⊕e₃.

In combining unit 15 a number L=3 of corrupted versions w₁, w₂, w₃ of the particular code word c₁=c(i, m₁) are thus obtained.
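The combination step for L=3 can be sketched in a few lines of Python. The bit vectors below are arbitrary stand-ins for code words and error vectors, chosen only to illustrate the identities; the function names are hypothetical.

```python
def xor(*words):
    """Coordinate-wise modulo-2 sum of any number of equal-length bit lists."""
    return [sum(bits) % 2 for bits in zip(*words)]

def combine(r1, r2, r3, u, v):
    """Return (w1, w2, w3) with w1 = r1, w2 = r2 + u, w3 = r3 + u + v (mod 2)."""
    return list(r1), xor(r2, u), xor(r3, u, v)

# Stand-in vectors (not real code words of any particular code).
c1, c2, c3 = [1, 0, 1, 1, 0, 1, 0], [0, 1, 1, 0, 1, 1, 0], [1, 1, 0, 0, 0, 1, 1]
e1, e2, e3 = [1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1]
r1, r2, r3 = xor(c1, e1), xor(c2, e2), xor(c3, e3)

# Assume the first decoding unit returned the correct estimates.
u, v = xor(c1, c2), xor(c2, c3)
w1, w2, w3 = combine(r1, r2, r3, u, v)
```

With correct estimates, each w_(j) carries the error pattern e_(j) on top of the same code word c₁.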

Next, in unit 16 the code vector z is constructed by component-wise majority voting of the corrupted versions w₁, w₂, w₃ of the code word c₁. That is, for each i∈{1, 2, . . . , n}, the i-th component z_(i) of the code vector z is an erasure if w_(1i), w_(2i) and w_(3i) are distinct; otherwise, the component z_(i) equals the most frequent element among w_(1i), w_(2i), w_(3i). The code vector z is then decoded by a second decoding unit 17 for the code C that decodes it into a code word c′ of said code C. Finally, in unit 18 the generator matrix G, which had been used by the encoder to encode the address words and the information words into code words, is used to retrieve the information word m and the address word a embedded in said code word c′.
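Component-wise majority voting with erasures can be sketched as below. Representing an erasure by None and the function name majority_vote are assumptions made for this illustration.

```python
from collections import Counter

ERASURE = None  # assumed marker for an erased coordinate

def majority_vote(versions):
    """Form the code vector z: per coordinate, take the most frequent symbol,
    or an erasure if more than one symbol occurs most frequently."""
    z = []
    for symbols in zip(*versions):
        ranked = Counter(symbols).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            z.append(ERASURE)
        else:
            z.append(ranked[0][0])
    return z

# Three corrupted versions of the same word, each with one error in a different place.
w1 = [0, 0, 1, 1, 0, 1, 0]
w2 = [1, 0, 1, 0, 0, 1, 0]
w3 = [1, 0, 1, 1, 0, 1, 1]
z = majority_vote([w1, w2, w3])
```

For binary symbols and an odd number of versions no tie can occur; erasures only arise for non-binary alphabets or an even number of versions.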

In general, corrupted versions of L consecutive code words are read, say r_(j)=c(i+j−1, m_(j))⊕e_(j) for j=1, 2, . . . , L. Estimates for the differences of each of the (L−1) pairs of consecutive code words are obtained. By combining these estimates, L corrupted versions w₁, w₂, . . . , w_(L) of the code word c₁=c(i, m₁) are obtained. If all the estimates are correct, then it holds w_(j)=c(i, m₁)⊕e_(j) for j=1, 2, . . . , L. The code vector z is obtained as the majority vote of w₁, . . . , w_(L) in each of the coordinates. If in a certain coordinate more than one symbol occurs most frequently, this coordinate in the code vector z is erased. Finally, the code vector z is decoded to a code word in the code C.
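For general L, the corrupted versions can be formed with a running modulo-2 sum of the estimates. The following sketch assumes that the j-th estimate is the decoded difference of the pair (r_(j), r_(j+1)); the vectors are arbitrary stand-ins and the function name combine_all is hypothetical.

```python
def xor(x, y):
    return [p ^ q for p, q in zip(x, y)]

def combine_all(received, estimates):
    """received: r_1..r_L; estimates: decoded differences of (r_j, r_{j+1}), j = 1..L-1.
    Returns w_1..w_L, where w_j = r_j + (sum of the first j-1 estimates) modulo 2."""
    versions = [list(received[0])]
    running = [0] * len(received[0])
    for r, estimate in zip(received[1:], estimates):
        running = xor(running, estimate)
        versions.append(xor(r, running))
    return versions

# Stand-in words and single-bit error vectors for L = 4.
codewords = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
errors = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
received = [xor(c, e) for c, e in zip(codewords, errors)]
estimates = [xor(codewords[j], codewords[j + 1]) for j in range(3)]  # assumed correct
versions = combine_all(received, estimates)
```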

For reducing decoding delay, it is advantageous if the number L is as small as possible. With L=2, the described method is not appropriate, as majority voting on two alternatives is not useful. If reliability information, also known as soft decision information, is available on the bits of the possibly mutilated code words r₁ and r₂, reliability information for each of the bits of w₂:=r₂⊕u can be obtained according to a well-known method. The code vector z can now, instead of by majority voting, be obtained by setting each coordinate z_(i) to the most reliable of the bits r_(1i) and w_(2i). For enhancing the decoding of the code vector z, reliability information could be included for each of the bits of the code vector z. The reliability information for bit i in z is obtained by combining the reliability information of the bits r_(1i) and w_(2i).
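A possible selection rule for L=2 is sketched below. The description does not fix how reliabilities are represented or combined, so this sketch assumes non-negative reliability values that behave like log-likelihood ratio magnitudes: agreeing bits add their reliabilities, disagreeing bits subtract them. That combination rule and the function name are assumptions, not the method of the description.

```python
def select_by_reliability(r1, rel1, w2, rel2):
    """Per coordinate, keep the more reliable of the bits r1[i] and w2[i].
    rel1 and rel2 are assumed non-negative reliabilities (larger = more trusted).
    Returns the code vector z and an assumed combined reliability per bit."""
    z, rel_z = [], []
    for bit1, q1, bit2, q2 in zip(r1, rel1, w2, rel2):
        if bit1 == bit2:          # both observations agree: reliabilities add
            z.append(bit1)
            rel_z.append(q1 + q2)
        elif q1 >= q2:            # disagreement: trust r1 (also on a tie)
            z.append(bit1)
            rel_z.append(q1 - q2)
        else:                     # disagreement: trust w2
            z.append(bit2)
            rel_z.append(q2 - q1)
    return z, rel_z

z, rel_z = select_by_reliability([1, 0, 1], [4, 1, 3], [1, 1, 0], [2, 5, 1])
```

A coordinate whose combined reliability is zero carries no information and could be treated as an erasure by the second decoder.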

Next, two special cases shall be briefly discussed. In a first special case, it is considered that the address word a(i) is the conventional b-bits binary representation of the integer i. For example, if b=8, then a(57)=00111001, as 57=0·2⁷+0·2⁶+1·2⁵+1·2⁴+1·2³+0·2²+0·2¹+1·2⁰. The binary representation of two consecutive integers nearly always has the same leftmost bit; the only exception is the address 011 . . . 1 that has 10 . . . 0 as successor. Therefore it can be assumed, with only a very small probability of being wrong, that the leftmost bit of the difference of two consecutive address words Δ(i):=a(i)⊕a(i+1) equals zero. More generally, it will be shown that it is very likely that Δ(i) starts with many zeros. In other words, a solution as proposed in the above mentioned European patent applications EP 01 201 841.2 and EP . . . of using a priori known information in the decoder can be applied since it is known that many of the leftmost information bits of Δ(i) are (with a high probability) equal to zero.

Let i be an integer between 0 and 2^(b)−2. Let j be the number of ones in which a(i) ends, so that 0≦j≦b−1. One can write a(i)=s01^(j), where s has length b−j−1, and 1^(j) denotes a string of j ones. It can easily be gathered that a(i+1)=s10^(j), and Δ(i)=a(i+1)⊕a(i)=0^(b−j−1)1^(j+1).

The following conclusions can be drawn:

-   a) for each i∈{0, 1, . . . , 2^(b)−2}, Δ(i) is of the form 0^(b−m)1^(m) for some m∈{1, 2, . . . , b}.
-   b) Δ(i)=0^(b−m)1^(m) if and only if a(i) ends in (m−1) ones.
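Both conclusions can be checked by enumeration for a small address length. The sketch below assumes b=8 and conventional binary addressing; the function names are illustrative.

```python
def delta(i, b):
    """Delta(i) = a(i) XOR a(i+1) as a b-character bit string (binary addressing)."""
    return format(i ^ (i + 1), "0{}b".format(b))

def trailing_ones(i):
    """Number of ones in which the binary representation of i ends."""
    count = 0
    while i & 1:
        count += 1
        i >>= 1
    return count

b = 8  # assumed address length
# Check Delta(i) = 0^(b-j-1) 1^(j+1), with j the number of trailing ones of a(i).
all_match = all(
    delta(i, b) == "0" * (b - trailing_ones(i) - 1) + "1" * (trailing_ones(i) + 1)
    for i in range(2**b - 1)
)
example = delta(57, b)  # a(57) = 00111001 ends in one 1, so Delta(57) ends in two ones
```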

From conclusion b) it follows that the number of integers i for which Δ(i) starts with b−m zeros and ends in m ones equals 2^(b−m). Stated differently, for m≧1, the fraction of integers i∈{0, 1, . . . , 2^(b)−2} for which Δ(i) ends in exactly m ones equals 2^(b−m)/(2^(b)−1)≈(½)^(m).

For example, the fraction of integers i for which Δ(i) ends in at most 4 ones approximately equals ½+¼+⅛+1/16=15/16=0.9375. That is, if it is assumed that Δ(i) starts with b−4 zeros, one is correct in nearly 94% of the cases. If it is assumed that Δ(i) starts with only b−8 zeros, the minimum Hamming distance obtained when applying the idea of using a priori known information symbols in the decoder drops, but the assumption is correct with a much larger probability of 255/256≈0.9961.
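The counts and fractions can be reproduced exactly for a concrete address length. The sketch assumes b=16, which is an arbitrary choice; the exact fractions then differ slightly from the approximations 15/16 and 255/256 because the denominator is 2^(b)−1 rather than 2^(b).

```python
from collections import Counter
from fractions import Fraction

def ones_in_delta(i):
    """Number m of ones in Delta(i) = a(i) XOR a(i+1) for binary addressing."""
    return bin(i ^ (i + 1)).count("1")

b = 16  # assumed address length
counts = Counter(ones_in_delta(i) for i in range(2**b - 1))

# Exact fractions of addresses i for which Delta(i) ends in at most 4 or 8 ones.
at_most_4 = Fraction(sum(counts[m] for m in range(1, 5)), 2**b - 1)
at_most_8 = Fraction(sum(counts[m] for m in range(1, 9)), 2**b - 1)
```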

After decoding a corrupted version of (Δ(i), m₁⊕m₂)G, i.e. after decoding the difference D₁₂, it should be checked if the purported value for Δ(i) is of the form 0^(b−m)1^(m) for some m≧1. If not, the decoding result should be rejected. This extra check greatly reduces the probability of incorrect decoding, e.g. of copyright information.
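The extra check on the purported value of Δ(i) amounts to a simple pattern test. A minimal sketch, assuming the address part of the decoded difference is available as a list of b bits; the function name is hypothetical.

```python
def is_valid_delta(bits):
    """True if bits has the form 0^(b-m) 1^m for some m >= 1 (binary addressing)."""
    b = len(bits)
    m = sum(bits)
    return m >= 1 and list(bits) == [0] * (b - m) + [1] * m

accepted = is_valid_delta([0, 0, 1, 1])   # of the form 0^2 1^2
rejected = is_valid_delta([0, 1, 0, 1])   # ones are not contiguous at the end
```

A decoding result whose address part fails this test would be rejected rather than passed on to the combining step.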

According to another special case all sectors have the same information word m. The difference between the code words for the sectors i and i+1 can be computed as follows: c(i, m)⊕c(i+1, m)=(Δ(i), 0)G.

In other words, the difference of any two consecutive code words is in the code C_(Δ), defined as C_(Δ)={(Δ(i), 0)G|0≦i≦2^(b)−2}. The difference of corrupted versions of two consecutive code words can therefore be decoded to the code C_(Δ). The code C_(Δ), which is a subcode of the main code C, has small cardinality if the set of difference vectors {Δ(i)|0≦i≦2^(b)−2} has small cardinality, and in that case, its minimum distance may well exceed the minimum distance of the code C.

In the special case that the address word a(i) is the conventional binary representation of i, it holds C_(Δ)={(0^(i), 1^(b−i), 0^(k−b))G|0≦i≦b−1}. Consequently, the subcode C_(Δ) contains only b words rather than 2^(b)−1 words, and optimal decoding can easily be performed by comparing the difference of two consecutive possibly mutilated words r with all b words from the subcode C_(Δ). It should be noted that the number of comparisons to be made is linear, not exponential, in b.
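The closed form of C_(Δ) and the decoding by b comparisons can be sketched as follows. The 4 x 7 generator matrix G and the choice b=3, k=4 are hypothetical and serve only to make the example small.

```python
# Hypothetical toy generator matrix (k = 4, n = 7), for illustration only.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
K, B = 4, 3  # assumed data word length and address length

def address_bits(i, b):
    return [(i >> (b - 1 - t)) & 1 for t in range(b)]

def encode(d, G):
    return [sum(x * row[col] for x, row in zip(d, G)) % 2 for col in range(len(G[0]))]

def xor(x, y):
    return [p ^ q for p, q in zip(x, y)]

def hamming_distance(x, y):
    return sum(p != q for p, q in zip(x, y))

# C_Delta from the closed form (0^i, 1^(b-i), 0^(k-b))G for 0 <= i <= b-1.
closed_form = [encode([0] * i + [1] * (B - i) + [0] * (K - B), G) for i in range(B)]

# The same set obtained by enumerating all pairs of consecutive addresses.
enumerated = {
    tuple(encode(xor(address_bits(i, B), address_bits(i + 1, B)) + [0] * (K - B), G))
    for i in range(2**B - 1)
}

def decode_difference(D):
    """Compare D with all b words of C_Delta and return the closest one."""
    return min(closed_form, key=lambda word: hamming_distance(D, word))

noisy = list(closed_form[1])
noisy[0] ^= 1                      # one bit error in the difference
estimate = decode_difference(noisy)
```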

Since Δ(i) usually does not end in many ones, it might be considered to decode to an even smaller subcode, namely C′_(Δ)={(0^(i), 1^(b−i), 0^(k−b))G|b′≦i≦b−1}, where b′ is an integer between 0 and b−1. The larger b′, the smaller C′_(Δ), but also the smaller the likelihood of correct decoding.

Also in case that the address representation a corresponds to binary Gray encoding, the subcode C_(Δ) only has b elements. This is because, by definition, binary Gray encoding means that two consecutive addresses only differ in one position, that is, for each i, Δ(i) consists of one 1 and b−1 zeros.
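The property of Gray encoding can be verified by enumeration. The sketch assumes the standard reflected binary Gray code, gray(i) = i XOR (i >> 1), which the description does not name explicitly, and an arbitrary address length b=4.

```python
def gray(i):
    """Reflected binary Gray code of i (assumed address representation)."""
    return i ^ (i >> 1)

b = 4  # assumed address length
deltas = [gray(i) ^ gray(i + 1) for i in range(2**b - 1)]

# Every difference of consecutive addresses has exactly one 1 ...
single_bit = all(bin(d).count("1") == 1 for d in deltas)
# ... and only b distinct difference vectors occur.
distinct = set(deltas)
```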

The present invention constitutes an effective and reliable method for retrieving information stored in code words situated in several consecutive sectors or transmitted subsequently in a data stream. It employs certain relationships between consecutive code words and jointly decodes several such consecutive code words. The present solution can be applied in any encoding and decoding system where address words having a known relationship are assigned to consecutive code words. 

1. Method of decoding possibly mutilated code words (r) of a code (C), wherein an information word (m) and an address word (a) are encoded into a code word (c) of said code (C) using a generator matrix (G) and wherein said address words (a) are selected such that address words (a) having a known relationship are assigned to consecutive code words (c), said method comprising the steps of: decoding the differences (D) of a number (L−1) of pairs of possibly mutilated code words (r_(i), r_(i+1)) to obtain estimates (u, v) for the differences of the corresponding pairs of code words (c_(i), c_(i+1)), combining said estimates (u, v) to obtain a number (L) of at least two corrupted versions (w_(j)) of a particular code word (c), forming a code vector (z) from said number (L) of corrupted versions (w_(j)) of said particular code word (c) in each coordinate, decoding said code vector (z) to a decoded code word (c′) in said code (C), and using said generator matrix (G) to obtain the information word (m) and the address word (a) embedded in said decoded code word (c′).
 2. Method as claimed in claim 1, wherein the step of forming said code vector (z) is performed by majority voting.
 3. Method as claimed in claim 2, wherein in the step of forming said code vector (z) a coordinate of said code vector (z) is erased if more than one value occurs most frequent among said number (L) of corrupted versions (w_(j)) of said particular code word (c).
 4. Method as claimed in claim 1, wherein in the step of forming said code vector (z) reliability information available on the symbols of one or more possibly mutilated code words (r) is used for selecting the coordinates of said code vector (z).
 5. Method as claimed in claim 1, wherein in the step of decoding the differences (D) of a number (L−1) of pairs of possibly mutilated code words (r) the difference of a pair of possibly mutilated code words (r_(i), r_(i+1)) is decoded to the closest code word from a subcode (C′) consisting of all possible differences of two consecutive code words (c) of the code (C), said closest code word being used as an estimate (u).
 6. Method as claimed in claim 1, further comprising after the step of decoding the differences (D) of a number (L−1) of pairs of possibly mutilated code words (r) to obtain said estimates (u, v) a step of checking if said estimates (u, v) show a predetermined form and/or have a possible value.
 7. Method as claimed in claim 1, wherein said address words (a) assigned to consecutive code words (c) are consecutive, in particular consecutive sector addresses of sectors of an information carrier storing said code words (c).
 8. Method as claimed in claim 1, wherein in said step of decoding the differences (D) of a number (L−1) of pairs of possibly mutilated code words (r) at least two pairs of two consecutive possibly mutilated code words (r) are decoded.
 9. Method as claimed in claim 1, wherein in said step of combining said estimates (u, v) to obtain a number (L) of corrupted versions (w_(j)) of a particular code word (c) a first corrupted version (w₁) corresponds to a first possibly mutilated code word (r₁), a second corrupted version (w₂) corresponds to the difference between a second possibly mutilated code word (r₂) and a first estimate (u), obtained by decoding the difference between said first and said second possibly mutilated code words (r₁, r₂), and a third corrupted version (w₃) corresponds to the difference between a third possibly mutilated code word (r₃), said first estimate (u) and a second estimate (v), obtained by decoding the difference between said second and said third possibly mutilated code words (r₂, r₃).
 10. Method as claimed in claim 1, further comprising a step of using a priori known information of address word (a) embedded in said possibly mutilated code word (r) to decode said code word (r) before decoding the differences (D) of a number (L−1) of pairs of possibly mutilated code words (r).
 11. Apparatus for decoding possibly mutilated code words (r) of a code (C), wherein an information word (m) and an address word (a) are encoded into a code word (c) of said code (C) using a generator matrix (G) and wherein said address words (a) are selected such that address words (a) having a known relationship are assigned to consecutive code words (c), said apparatus comprising: first decoding means for decoding the differences (D) of a number (L−1) of pairs of possibly mutilated code words (r_(i), r_(i+1)) to obtain estimates (u, v) for the differences of the corresponding pairs of code words (c_(i), c_(i+1)), combining means for combining said estimates (u, v) to obtain a number (L) of at least two corrupted versions (w_(j)) of a particular code word (c), forming means for forming a code vector (z) from said number (L) of corrupted versions (w_(j)) of said particular code word (c) in each coordinate, second decoding means for decoding said code vector (z) to a decoded code word (c′) in said code (C), and use means for using said generator matrix (G) to obtain the information word (m) and the address word (a) embedded in said decoded code word (c′).
 12. Computer program comprising computer program code means for causing a computer to perform the steps of the method as claimed in claim 1 when said computer program is run on said computer. 